Penalized Regression for Genome-Wide Association Screening of Sequence Data

Authors

  • Hua Zhou
  • David H. Alexander
  • Mary E. Sehl
  • Janet S. Sinsheimer
  • Kenneth Lange
Abstract

Whole exome and whole genome sequencing are likely to be potent tools in the study of common diseases and complex traits. Despite this promise, some very difficult issues in data management and statistical analysis must be squarely faced. The number of rare variants identified by sequencing is apt to be much larger than the number of common variants encountered in current association studies. The low frequencies of rare variants alone will make association testing difficult. This article extends the penalized regression framework for model selection in genome-wide association data to sequencing data with both common and rare variants. Previous research has shown that lasso penalties discourage irrelevant predictors from entering a model. The Euclidean penalties dealt with here group variants by gene or pathway. Pertinent biological information can be incorporated by calibrating penalties by weights. The current paper examines some of the tradeoffs in using pure lasso penalties, pure group penalties, and mixtures of the two types of penalty. All of the computational and statistical advantages of lasso penalized estimation are retained in this richer setting. The overall strategy is implemented in the free statistical genetics analysis software MENDEL and illustrated on both simulated and real data.
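As a rough illustration of the three penalty choices named in the abstract, the sketch below evaluates a weighted mixture of a lasso penalty and a Euclidean group penalty on a coefficient vector. The abstract does not state the exact objective, so the penalty form, the function name `mixed_penalty`, the gene labels, and the tuning constants are all assumptions made for illustration; this is not MENDEL's implementation.

```python
import math

def mixed_penalty(beta, groups, lam_lasso, lam_group, weights=None):
    """Assumed form: lam_lasso * sum_j |beta_j|
                   + lam_group * sum_g w_g * ||beta_g||_2.

    beta    -- list of regression coefficients, one per variant
    groups  -- dict mapping a (hypothetical) gene or pathway name
               to the indices of the variants it contains
    weights -- optional dict of per-group weights (defaults to 1.0)
    """
    if weights is None:
        weights = {name: 1.0 for name in groups}
    # Lasso term: penalizes each coefficient individually.
    lasso_term = sum(abs(b) for b in beta)
    # Euclidean term: penalizes the L2 norm of each group as a unit.
    group_term = sum(
        weights[name] * math.sqrt(sum(beta[j] ** 2 for j in idx))
        for name, idx in groups.items()
    )
    return lam_lasso * lasso_term + lam_group * group_term

# Toy coefficients for five variants in two hypothetical genes.
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = {"GENE_A": [0, 1], "GENE_B": [2, 3, 4]}

pure_lasso = mixed_penalty(beta, groups, lam_lasso=1.0, lam_group=0.0)
pure_group = mixed_penalty(beta, groups, lam_lasso=0.0, lam_group=1.0)
mixture = mixed_penalty(beta, groups, lam_lasso=0.5, lam_group=0.5)
```

Under this assumed form, the group term charges GENE_A a single cost of 5 for its two nonzero coefficients, whereas the lasso term charges them 3 + 4 = 7. That difference is one way to see why group penalties favor selecting variants together by gene or pathway.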


Similar resources

Practical Issues in Screening and Variable Selection in Genome-Wide Association Analysis

Variable selection methods play an important role in high-dimensional statistical modeling and analysis. Computational cost and estimation accuracy are the two main concerns for statistical inference from ultrahigh-dimensional data. In particular, genome-wide association studies (GWAS), which focus on identifying single nucleotide polymorphisms (SNPs) associated with a disease of interest, have...


A two-stage penalized logistic regression approach to case-control genome-wide association studies

In this talk, we discuss a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1 penalized logistic likelihood in a tournament procedure. In the selection stage, the retained features are ranke...


Association screening of common and rare genetic variants by penalized regression

MOTIVATION: This article extends our recent research on penalized estimation methods in genome-wide association studies to the realm of rare variants. RESULTS: The new strategy is tested on both simulated and real data. Our findings on breast cancer data replicate previous results and shed light on variant effects within genes. AVAILABILITY: Rare variant discovery by group penalized regression...


Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control

Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G...


In Silico Genome-Wide Screening for TnrA-Regulated Genes of Bacillus clausii

Bacillus clausii TnrA transcription factor is required for global nitrogen regulation. In order to obtain an overview of gene regulation by TnrA in B. clausii KSMK16, the entire genome of B. clausii was screened for the consensus sequence 5'-TGTNAN7TNACA-3', known as the TnrA box, and 13 transcription units were found containing a putative TnrA box. The TnrA targets identified in...



Journal:
  • Pacific Symposium on Biocomputing


Publication date: 2011